Q: Which puzzle-building strategy is best, and how good is it?
Start with edge pieces
Sort and build high contrast/color regions
Sort into knob-and-hole combinations
How to measure puzzle strategy success/failure?
time to completion
average number of tries before a piece fits
proportion of first tries that fit
Defining terms
model selection is “estimating the performance of different models in order to choose the best one”
identify the best puzzle building strategy from among the candidates on a relative basis
model assessment is “having chosen a final model, estimating its prediction error…on new data”
measuring how good your chosen puzzle building strategy is
@hastie.elements
Two main considerations
Using data honestly
Measuring error
Using data honestly
Notation
Each observation’s outcome \(Y\) (continuous or categorical) is to be predicted with a function of auxiliary knowledge, i.e. covariates, from that observation \(X=x\); the prediction is denoted \(\hat Y(x)\)
Want \(\hat Y(x)\) to be close to \(Y\), but need to define what is meant by “close”
Use ‘mean squared prediction error’ (MSPE) for now: the average of \((Y - \hat Y(x))^2\) over observations, \(\frac{1}{n}\sum_{i=1}^{n}\bigl(Y_i - \hat Y(x_i)\bigr)^2\)
Except for true and null models, all MSPEs increase
Firth model has smallest MSPE in validation subset
An aside: even though it has the smallest MSPE, is it still good in an absolute sense? What MSPE do we get from using \(\hat Y(x_i)\equiv 0.5\)?
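A quick worked answer to the aside, assuming \(Y\) is binary (coded 0/1, as the Firth model context suggests): the constant predictor has the same squared error on every observation,

\[
\bigl(Y - 0.5\bigr)^2 = 0.25 \quad \text{whether } Y = 0 \text{ or } Y = 1,
\]

so \(\hat Y(x_i)\equiv 0.5\) yields an MSPE of exactly 0.25, a natural benchmark for judging whether a model is good in an absolute sense.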
Assuming we report the Firth model as the final model, is this the MSPE we should expect on future data?
Simulation study
\(n = 200/200/200\) observations in training/validation/testing
Same generating model as before, but repeated 500 times
In each simulated dataset, only three models are taken to the testing step: the true and null models (for benchmarking) and whichever other model has best validation MSPE
observed_results <- all_results %>%
  group_by(sim) %>%
  # keep from the test step only the method we would have selected
  filter(model_name %in% c("(truth)", "null") |
           step != "testing" |
           mspe_ranking == min(mspe_ranking)) %>%
  ungroup()